Abstract

This paper addresses the design of a dedicated homophonic coding for a class of communication
systems which, in order to provide both reliability and security, first encode the data before encrypting
it, an approach referred to as the encoding-encryption paradigm. The considered systems employ
error-correction coding for reliability, a stream cipher for encryption, and homophonic coding to enhance the
protection of the key used in the stream cipher, on which the security of all the system transmissions relies.

This paper presents a security evaluation of such systems from a computational complexity point of
view, which serves as a source for establishing dedicated homophonic code design criteria. The security
evaluation shows that the computational complexity of recovering the secret key, given all the information
an attacker could gather during the passive attacks he can mount, is lower bounded by the complexity of the
related LPN (Learning Parity with Noise) problem in both the average and worst case. This gives
guidelines to construct a dedicated homophonic encoder which maximizes the complexity of the underlying
LPN problem for a given encoding overhead. Finally, this paper proposes a generic homophonic coding
strategy that fulfills the proposed design criteria and thus enhances security while minimizing the
induced overhead.

Keywords: (wireless) communications systems, homophonic coding, error-correction coding, stream
ciphers, randomness, security evaluation, LPN problem.

1 Introduction 

Most communication systems take into account not only the reliability but also the security of the data
they transmit. This is particularly true in wireless environments, where the data is inherently more susceptible
to security threats. Consequently, the system design needs to include both coding schemes for providing
error-correction and ciphering algorithms for encryption-decryption. It is common practice to first encrypt
the data to ensure the security, and then to encode for reliability purposes. In this paper, we focus on those
communication systems which adopt the reverse approach: they first encode the data, and then encrypt
it, which we call the encoding-encryption paradigm. A famous illustrative example is the most widespread
standard for mobile telephony GSM, standing for "Global System for Mobile Communications" (see W and 
[5], for the coding, respectively security details). 

In the considered system, encryption-decryption is done through a stream cipher, since the receiver
needs to first decrypt the data despite the noise, before performing the decoding, which cannot be done
through block ciphers. Consequently the security of the system relies crucially on the private key used in the
stream cipher and thus when we refer to the security of systems using the encoding-encryption paradigm,
we implicitly mean the security of the keystream generator and the users' secret key.

Motivation for the Work. Homophonic coding (see 'B[ and [S]) is a natural technique to enhance the security 
of systems employing the encoding-encryption paradigm, since it injects extra randomness in the system, 
which increases the confusion of a possible adversary by amplifying the channel noise that he experiences. 
This idea has been recently exploited in [11] and [12], where it was shown from an information-theoretic point
of view that with the aid of a dedicated homophonic encoder, the amount of uncertainty that the adversary 
must face about the secret key given all the information he could gather during different passive or active 
attacks he can mount, is a decreasing function of the samples available for cryptanalysis. This means that 
there is a threshold before which the homophonic encoding indeed provides a certain level of unconditional 
security, but after a large sample is collected, the uncertainty tends to zero, entering a regime in which a 
computational security analysis is needed for estimating the resistance against the secret key recovery. This 
paper addresses this computational complexity security evaluation, and highlights how the computational 
security is related to the homophonic coding design. 

Summary of the Results. This paper proposes a homophonic code design derived from a security evaluation of
the secret key recovery which shows that the security of systems using the encoding-encryption approach can
be related to the complexity of solving certain LPN (Learning Parity with Noise) problems. It follows from
this analysis that the dedicated homophonic encoding plays a role in securing the system, and that a careful
design makes the underlying LPN problem substantially harder in the average case. This gives guidelines for
the design of a dedicated homophonic encoder, comprising five conditions: two related to the
information-theoretic security, two regarding the computational security, and one concerning implementation
costs. We finally propose a homophonic coding strategy which fulfills all the given criteria.

Organization of the Paper. Section 2 introduces the background for the problem addressed in this paper. 
Section 3 contains the security evaluation from a computational complexity point of view. Implications of 
the security evaluation on the design of a dedicated homophonic coding are discussed in Section 4, where the 
code design criteria are established, while the code constructions are given in Section 5. Concluding remarks 
including some directions for future work are given in Section 6.

2 Background 

We consider a class of communication systems which, in order to provide both reliability and security, 
employs the encoding-encryption paradigm, namely: the message is first encoded employing error-correction 
coding for the purpose of reliability and then encrypted using a stream cipher. It has been shown in
[11] and [12] via an information-theoretic analysis that the use of a dedicated homophonic/wire-tap encoding
enhances the security of such systems. This section summarizes the reported design and the results of its 
information-theoretic security evaluation. 

2.1 System Model 

Figure 1: Considered communication system model including homophonic coding.

In systems employing the encoding-encryption paradigm (as shown in Figure 1), a stream cipher is used, on
which the security of the whole system relies. It is thus crucial to focus on the security of its private key.
The system reported and analyzed in [11] and [12] aims at increasing the security of the private key against
both passive and active attacks by introducing a dedicated homophonic/wire-tap encoder which involves
extra randomness as follows. Let $\mathbf{k}$ be the private key, let $C_H(\cdot)$ denote a homophonic encoder, added at
the transmitter, and let

$$\mathbf{u} = [u_i]_{i=1}^{m-\ell} \in \{0,1\}^{m-\ell}$$

be a vector of pure randomness¹, where each $u_i$ is the realization of a random variable $U_i$ with distribution
$\Pr(U_i = 1) = \Pr(U_i = 0) = 1/2$. The homophonic encoder is positioned before the error-correcting encoder
$C_{ECC}(\cdot)$, thus out of the $m$ bits of data to be sent, $m - \ell$ are replaced by random data, leaving only
$\ell$ bits

$$\mathbf{a} = [a_i]_{i=1}^{\ell} \in \{0,1\}^{\ell}$$

of plaintext, to get

$$\mathbf{y} = \mathbf{y}(\mathbf{k}) = C_{ECC}(C_H(\mathbf{a}||\mathbf{u})) \oplus \mathbf{x} \tag{1}$$

as codeword to be sent, where $\mathbf{x} = \mathbf{x}(\mathbf{k})$ is the output of the keystream generator. We assume that both
homophonic and error-correction encoding are linear operations, so that

$$C_H(\mathbf{a}||\mathbf{u}) = [\mathbf{a}||\mathbf{u}]\,G_H, \tag{2}$$

where $G_H$ is an $m \times m$ matrix, and thus

$$C_{ECC}(C_H(\mathbf{a}||\mathbf{u})) = C_{ECC}([\mathbf{a}||\mathbf{u}]\,G_H) = [\mathbf{a}||\mathbf{u}]\,G_H G_{ECC} = [\mathbf{a}||\mathbf{u}]\,G \tag{3}$$

where $G_{ECC}$ is an $m \times n$ binary generator matrix corresponding to $C_{ECC}(\cdot)$, and $G = G_H G_{ECC}$ is an
$m \times n$ binary matrix summarizing the two successive encodings at the transmitter. This homophonic/wire-tap
approach is a natural candidate for security enhancement, since homophonic (resp. wire-tap) coding is a
well-known technique to create confusion while performing source (resp. channel) coding: on the one hand,
homophonic coding [5] provides (i) multiple substitutions of a given source vector via randomness so that
the coded versions of the source vectors appear as realizations of a random source; (ii) recoverability of the
source vector based on the given codeword without knowledge of the randomization. On the other hand,
the main goals of wire-tap channel coding (see [11] and [13] when the main channel is error-free) are: (i)
amplification of the noise difference between the main and wire-tap channel via randomness; (ii) a reliable
transmission in the main channel while at the same time providing total confusion of the wire-tapper who
observes the communication in the main channel via a noisy channel (wire-tap channel).

¹ Although it is assumed for simplicity of presentation, the randomness involved does not need to be "perfect" because the
role of randomness is just to enhance security of the employed keystream generator and not to perform encryption itself. In other
words, the randomness involved does not need to fulfill all the requirements of the randomness for one-time pad encryption,
and accordingly implementation issues become substantially simpler.

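In this paper we assume that transmission occurs via a binary symmetric channel (BSC)² with known
crossover probability $p$, so that the receiver obtains

$$\mathbf{z} = \mathbf{z}(\mathbf{k}) = \mathbf{y} \oplus \mathbf{v} = C_{ECC}(C_H(\mathbf{a}||\mathbf{u})) \oplus \mathbf{x} \oplus \mathbf{v} \tag{4}$$

where $\mathbf{v} = [v_i]_{i=1}^{n} \in \{0,1\}^n$ and $v_i$ is the realization of a binary random variable $V_i$ such that
$\Pr(V_i = 0) = 1 - p$ and $\Pr(V_i = 1) = p$. Since he knows the private key, the receiver starts with the decryption

$$\mathbf{z} \oplus \mathbf{x} = (C_{ECC}(C_H(\mathbf{a}||\mathbf{u})) \oplus \mathbf{x} \oplus \mathbf{v}) \oplus \mathbf{x} = C_{ECC}(C_H(\mathbf{a}||\mathbf{u})) \oplus \mathbf{v}$$

and then decodes the error-correcting code to obtain

$$C_H(\mathbf{a}||\mathbf{u}).$$

If the decoding is successful, $\mathbf{a}$ is recovered by inverting the homophonic encoding, and the receiver informs
the transmitter via a feedback link that he could decode. Otherwise he informs the transmitter that
retransmission is required.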
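
As an illustration of equations (1)-(4), the following minimal sketch runs one transmission over GF(2) with NumPy. The toy sizes, the random matrix `G`, and the helper names `encode_encrypt` and `bsc` are assumptions made for this example only; the error-correction decoder is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_encrypt(a, u, G, x):
    """Return y = [a||u] G xor x over GF(2), as in equations (1)-(3)."""
    au = np.concatenate([a, u])
    return (au @ G % 2) ^ x

def bsc(y, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    v = (rng.random(y.shape) < p).astype(np.uint8)
    return y ^ v, v

# Toy parameters (illustrative only): ell data bits, m - ell random bits, n coded bits.
ell, m, n = 3, 5, 8
G = rng.integers(0, 2, size=(m, n), dtype=np.uint8)      # stands in for G_H G_ECC
a = rng.integers(0, 2, size=ell, dtype=np.uint8)         # plaintext
u = rng.integers(0, 2, size=m - ell, dtype=np.uint8)     # pure randomness
x = rng.integers(0, 2, size=n, dtype=np.uint8)           # keystream x(k)

y = encode_encrypt(a, u, G, x)      # transmitted word, equation (1)
z, v = bsc(y, 0.1, rng)             # received word, equation (4)
noisy_codeword = z ^ x              # decryption: the keystream cancels, the noise v remains
```

The last line shows why a stream cipher is convenient here: decryption leaves the channel noise untouched, so the error-correcting decoder sees an ordinary noisy codeword.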

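In the meantime, the adversary keeps on listening and collects over time samples $\{\mathbf{z}^{(t)}\}_{t=1}^{\tau}$ such that

$$\mathbf{z}^{(t)} = C_{ECC}(C_H(\mathbf{a}^{(t)}||\mathbf{u}^{(t)})) \oplus \mathbf{x}^{(t)} \oplus \mathbf{v}^{(t)}, \quad t = 1, 2, \ldots, \tau, \tag{5}$$

employing the notation $\mathbf{a}^{(t)} = [a_i^{(t)}]_{i=1}^{\ell}$ for the plaintext, $\mathbf{u}^{(t)} = [u_i^{(t)}]_{i=1}^{m-\ell}$ for the pure randomness used in
the homophonic encoder, $\mathbf{v}^{(t)} = [v_i^{(t)}]_{i=1}^{n}$ for the channel noise, and $\mathbf{z}^{(t)} = [z_i^{(t)}]_{i=1}^{n}$ for the received signal.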

The main feature of the dedicated homophonic coding used in [11]-[12] is that the encoding is based
on randomness and that the legitimate receiver who shares a secret key with the transmitter can perform
decoding without knowledge of the randomness employed for the encoding, as shown above. As a result, the
decoding complexity without knowledge of the secret key employed in the system approaches the complexity
of the exhaustive search for the secret key. The encoding-encryption paradigm for secure and reliable
communications enjoys the following desirable properties: (i) when the decryption is performed by bitwise XORing
the keystream to the ciphertext, an error in a bit before decryption causes an error in the corresponding bit
after decryption, without any error propagation, and (ii) the error-free keystream is not available to an
observer when the communication channel is a noisy one.

2.2 Information-Theoretic Security Evaluation 

It has been shown in [11] and [12] that from an information-theoretic point of view, it is enough to use
a generic wire-tap encoder to increase the security of the system. Namely, under the assumptions that
the homophonic encoding matrix $G_H$ defined in (2) (i) is invertible, so that the receiver can decode the
homophonic encoding, and (ii) maps $[\mathbf{a}||\mathbf{u}]$ so that in the resulting vector each bit of ciphertext $\mathbf{z}$ is affected
by at least one random bit from $\mathbf{u}$, to make sure that each bit of ciphertext is protected, then the homophonic
(wire-tap channel) encoding increases the security of the private key $\mathbf{k}$ under both passive and active attacks,
where the attacker can modify the ciphertext and learn the effects of these modifications through the feedback
link. More precisely:

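Theorem 1 Let

$$\mathbf{Z}^{(t)} = C_{ECC}(C_H(\mathbf{A}^{(t)}||\mathbf{U}^{(t)})) \oplus f^{(t)}(\mathbf{K}) \oplus \mathbf{V}^{(t)}, \quad t = 1, 2, \ldots, \tau, \tag{6}$$

be the samples available for cryptanalysis obtained over a period of $\tau$ times, where $\mathbf{A}^{(t)} = [A_i^{(t)}]_{i=1}^{\ell}$,
$\mathbf{U}^{(t)} = [U_i^{(t)}]_{i=1}^{m-\ell}$, $\mathbf{V}^{(t)} = [V_i^{(t)}]_{i=1}^{n}$ and $\mathbf{Z}^{(t)} = [Z_i^{(t)}]_{i=1}^{n}$ are random variables with entropy $H(\mathbf{A})$, $H(\mathbf{U})$, $H(\mathbf{V})$
and $H(\mathbf{Z})$, representing respectively the data to be sent, pure randomness involved in the wire-tap encoder,
random noise, and signals seen by the attacker. Let $\mathbf{K}$ be a binary vector random variable uniformly chosen
representing the key, such that $H(\mathbf{K}) = |\mathbf{K}|$, and $f^{(j)}(\cdot)$, $j = 1, 2, \ldots, \tau$, are given deterministic functions
describing the keystream generator, that is, $\mathbf{X}^{(t)} = f^{(t)}(\mathbf{K})$. When $\Pr(V_i^{(j)} = 0) \neq \Pr(V_i^{(j)} = 1) \neq 1/2$,
$i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, \tau$, we have:

for $\mathbf{A} = [\mathbf{A}^{(1)}||\mathbf{A}^{(2)}||\ldots||\mathbf{A}^{(\tau)}]$ and $\mathbf{Z} = [\mathbf{Z}^{(1)}||\mathbf{Z}^{(2)}||\ldots||\mathbf{Z}^{(\tau)}]$.

² A more general class of binary channels was discussed in [11] and [12].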

This theorem statement is interpreted as follows. The uncertainty on the key, measured by the equivocation
of the key assuming a known plaintext attack, implies an information-theoretic security as long as
only a limited sample is available for cryptanalysis. This information-theoretic security appears as a
consequence of the randomness involved via the homophonic coding as well as the communication channel noise.
It has been shown in [11]-[12] that the randomness introduced via homophonic coding has a heavy impact on
providing the information-theoretic security. On the other hand, the theorem statement also points out that
the information-theoretic security does not hold anymore if an attacker can collect a large enough sample
for cryptanalysis, i.e., if the parameter $\tau$ is large enough the equivocation reduces to zero, which implies that
the secret key is recoverable assuming availability of certain computational power. Accordingly, as long as
$\tau < \tau_{thresh}$, the key is protected by the randomness of the homophonic encoder and of the noisy channel in
an information-theoretic manner, but that protection cannot last forever if the adversary collects too much
data. This does not say that it is easy for the adversary to recover the key, as will be discussed below, but
only that the information-theoretic security does not hold anymore.

3 Computational Complexity Security Evaluation 

This section analyzes the security of the proposed scheme from a computational complexity point of view in
the chosen plaintext attack (CPA) scenario. In this case, the security evaluation consists of establishing
how hard it is to find the secret key based on the algebraic representation of the encryption. We will show in
our complexity analysis that the hardness of recovering the key relies on the hardness of the LPN problem
(see [2], [5], [7], for example). The analysis will pinpoint the features that the homophonic encoder should
have so as to increase the complexity of the underlying LPN problem in the average case.

3.1 Preliminaries 

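We consider the scenario where a large enough sample $\{\mathbf{z}^{(t)}\}_{t=1}^{\tau}$ has been recorded by an attacker, who can
now try to find the employed secret key $\mathbf{k}$ contained in $\mathbf{x}^{(t)} = \mathbf{x}^{(t)}(\mathbf{k})$ using, from (5),

$$\mathbf{z}^{(t)} = C_{ECC}(C_H(\mathbf{a}^{(t)}||\mathbf{u}^{(t)})) \oplus \mathbf{x}^{(t)} \oplus \mathbf{v}^{(t)}, \quad t = 1, 2, \ldots, \tau,$$

since he has a probability of error in recovering the key which now tends to zero.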

For simplicity of exposition, we assume from now on that $|\mathbf{K}| = n$. We further perform the security
evaluation under the following two assumptions:

• $\mathbf{x}^{(t)} = f^{(t)}(\mathbf{k}) = \mathbf{k}S^t$, $t = 1, 2, \ldots, \tau$, where $S = [s_{i,j}]_{i=1,j=1}^{n,n}$ is a binary matrix, and

$$S^t = [S_1^{(t)}, S_2^{(t)}, \ldots, S_n^{(t)}] \tag{8}$$

where each $S_i^{(t)}$, $i = 1, 2, \ldots, n$, denotes a column of the $t$th power of the matrix $S$; note that usually
$f^{(t)}(\cdot)$ is a heavily nonlinear function, and treating it as a linear one yields a scenario
for evaluating a lower bound on the complexity of the secret key recovery;

• we consider the chosen plaintext attack where the data is the all-zero vector, i.e. $\mathbf{a}^{(t)} = \mathbf{0}$, for each
$t$.

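Under the above assumptions, and recalling from Section 2.1 that both $C_{ECC}$ and $C_H$ are linear encoders, we
can write

$$\mathbf{k}S^t \oplus [\mathbf{0}||\mathbf{u}^{(t)}]\,G = \mathbf{z}^{(t)} \oplus \mathbf{v}^{(t)},$$

from which we obtain an algebraic representation of the recovery problem in terms of a noisy system of linear
equations, as seen by the adversary:

$$\begin{bmatrix} \mathbf{k}S_1^{(t)} \\ \mathbf{k}S_2^{(t)} \\ \vdots \\ \mathbf{k}S_n^{(t)} \end{bmatrix} \oplus \begin{bmatrix} [\mathbf{0}||\mathbf{u}^{(t)}]\,G_1 \\ [\mathbf{0}||\mathbf{u}^{(t)}]\,G_2 \\ \vdots \\ [\mathbf{0}||\mathbf{u}^{(t)}]\,G_n \end{bmatrix} = \begin{bmatrix} z_1^{(t)} \\ z_2^{(t)} \\ \vdots \\ z_n^{(t)} \end{bmatrix} \oplus \begin{bmatrix} v_1^{(t)} \\ v_2^{(t)} \\ \vdots \\ v_n^{(t)} \end{bmatrix}, \quad t = 1, 2, \ldots, \tau, \tag{9}$$

where $\mathbf{u}^{(t)} = [u_i^{(t)}]_{i=1}^{m-\ell}$ and $G_i$ denotes the $i$th column of $G$.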

Remark 1 Note that in the set $\{[\mathbf{0}||\mathbf{u}^{(t)}]\,G_i\}_{i=1}^{n}$ all the elements can be split into two non-overlapping
subsets such that one subset contains $k$ linearly independent elements, $k$ at most $m - \ell$, and the other subset
contains $n - k$ elements each of which is a linear combination of the elements from the first subset, since
$[\mathbf{0}||\mathbf{u}^{(t)}]\,G$ only involves the lower part of $G$, which is an $(m - \ell) \times n$ matrix, which has thus at most $m - \ell$
linearly independent columns, and the other columns can be obtained as linear combinations.

The problem of solving a system of linear equations in the presence of noise is related to the so-called
LPN problem, defined formally as follows.

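Definition - The LPN Problem. Let $\langle \cdot | \cdot \rangle$ denote the binary inner product. Let $\mathbf{s}$ be a random $n$-bit
vector, let $\epsilon \in\, ]0, 1/2[$ be a constant noise parameter, let $\mathrm{Ber}_\epsilon$ be the Bernoulli distribution with parameter $\epsilon$
(so if $\nu \leftarrow \mathrm{Ber}_\epsilon$ then $\Pr(\nu = 1) = \epsilon$ and $\Pr(\nu = 0) = 1 - \epsilon$), and let $A_{\mathbf{s},\epsilon}$ be the distribution defined as

$$\{\mathbf{a} \leftarrow \{0,1\}^n;\ \nu \leftarrow \mathrm{Ber}_\epsilon : (\mathbf{a}, \langle \mathbf{s} | \mathbf{a} \rangle \oplus \nu)\}.$$

Let $A_{\mathbf{s},\epsilon}$ also denote an oracle which outputs independent samples according to the above distribution. An
algorithm $\mathcal{M}$ is said to $(q, t, m, \theta)$-solve the $\mathrm{LPN}_{n,\epsilon}$ problem if

$$\Pr[\mathbf{s} \leftarrow \{0,1\}^n : \mathcal{M}^{A_{\mathbf{s},\epsilon}}(1^n) = \mathbf{s}] \geq \theta,$$

and furthermore $\mathcal{M}$ runs in time at most $t$, memory at most $m$, and makes at most $q$ queries to its oracle.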

What the LPN problem captures is that, given a security parameter $k$, a secret vector $\mathbf{x}$, and $\mathbf{g}_1, \ldots, \mathbf{g}_n$
randomly chosen binary vectors of length $n = O(k)$, it is possible, knowing $y_i = \langle \mathbf{x} | \mathbf{g}_i \rangle$ and $\{\mathbf{g}_i\}_{i=1}^{n}$, to solve
for $\mathbf{x}$ using standard linear-algebraic techniques as long as there is no noise. However, when each $y_i$ is flipped
(independently) with probability $p$, finding $\mathbf{x}$ becomes much more difficult. The problem of learning $\mathbf{x}$ in
this latter case is referred to as the learning parity with noise (LPN) problem.

Finally note that the LPN problem is equivalent to the problem of decoding a general linear block code,
that this problem is known to be NP-complete [1], and that relating the security of an encryption technique to
the LPN problem has been employed for security evaluation of certain stream ciphers (see [10], for example).
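
The contrast between the noiseless and the noisy case can be sketched as follows. This is an illustrative toy, not part of the evaluated scheme: `lpn_samples` plays the role of the oracle from the definition above, and `solve_gf2` is plain Gaussian elimination over GF(2), which recovers the secret only when the samples are noise-free. Both function names and the chosen sizes are assumptions of this example.

```python
import numpy as np

def lpn_samples(s, eps, q, rng):
    """Draw q samples (a, <s|a> xor nu) with nu ~ Ber(eps), stacked as (A, b)."""
    n = len(s)
    A = rng.integers(0, 2, size=(q, n), dtype=np.int64)
    nu = (rng.random(q) < eps).astype(np.int64)
    b = (A @ s + nu) % 2
    return A, b

def solve_gf2(A, b):
    """Gauss-Jordan elimination over GF(2).

    Returns the unique solution of A s = b when A has full column rank and the
    system is consistent (noise-free); returns None if A is rank deficient.
    With noisy b the returned vector is in general NOT the secret.
    """
    A = A.copy() % 2
    b = b.copy() % 2
    q, n = A.shape
    row = 0
    for col in range(n):
        piv = next((r for r in range(row, q) if A[r, col]), None)
        if piv is None:
            return None
        A[[row, piv]] = A[[piv, row]]
        b[[row, piv]] = b[[piv, row]]
        for r in range(q):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                b[r] ^= b[row]
        row += 1
    return b[:n]

rng = np.random.default_rng(7)
n = 10
s = rng.integers(0, 2, size=n, dtype=np.int64)

A0, b0 = lpn_samples(s, 0.0, 4 * n, rng)      # eps = 0: ordinary linear algebra
recovered = solve_gf2(A0, b0)

A1, b1 = lpn_samples(s, 0.125, 4 * n, rng)    # eps > 0: elimination mixes the errors
noisy_guess = solve_gf2(A1, b1)
```

In the noisy run, every row addition performed by the elimination also adds the corresponding error bits, which is the effect quantified in Section 3.2.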

3.2 Complexity Evaluation 

A systematic way to solve a system of linear equations, with or without noise, is to perform a Gaussian
elimination. If the system furthermore contains unknowns that we are not interested in finding, it is natural
to start by removing them, so as to obtain a system with a smaller number of equations, where only the
unknowns we would like to find are left. We will now show how such a strategy changes the noise present in
the system of equations.
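Lemma 1 Consider the following system of $N$ equations over the binary field GF(2) to be solved for
$x_1, \ldots, x_L$, $L < N$:

$$\Big(\bigoplus_{j=1}^{L} \alpha_j^{(i)} x_j\Big) \oplus y_i = z_i \oplus e_i, \quad i = 1, 2, \ldots, M,$$

$$\Big(\bigoplus_{j=1}^{L} \alpha_j^{(i)} x_j\Big) \oplus \Big(\bigoplus_{j=1}^{M} \beta_j^{(i)} y_j\Big) = z_i \oplus e_i, \quad i = M+1, M+2, \ldots, N, \tag{10}$$

where $\{z_i\}_{i=1}^{N}$, $\{\alpha_j^{(i)}\}$ and $\{\beta_j^{(i)}\}$ are known, $\{x_j\}_{j=1}^{L}$, $\{y_j\}_{j=1}^{M}$ and $\{e_i\}_{i=1}^{N}$ are unknown, and
each $e_i$ is a realization of a random variable $E_i$ such that $\Pr(E_i = 1) = p < 1/2$, $i = 1, 2, \ldots, N$. If

1. the Hamming weight of each vector $[\beta_1^{(i)}, \ldots, \beta_M^{(i)}]$ is greater than or equal to some parameter $w$, for
$i = M+1, M+2, \ldots, N$,

2. and no $\bigoplus_{j=1}^{M} \beta_j^{(k)} y_j$, $k \in \{M+1, M+2, \ldots, N\}$, is a linear combination of any other $w$ or less
$\bigoplus_{j=1}^{M} \beta_j^{(i)} y_j$, $i \in \{M+1, M+2, \ldots, N\}$, i.e., there are at least $w$ linearly independent sums $\bigoplus_{j=1}^{M} \beta_j^{(i)} y_j$
among those $i \in \{M+1, \ldots, N\}$,

then the problem of recovering the unknown $x_1, x_2, \ldots, x_L$ is the problem of solving the following system of
equations:

$$\Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} \Big(\bigoplus_{j=1}^{L} \alpha_j^{(k)} x_j\Big)\Big) \oplus \Big(\bigoplus_{j=1}^{L} \alpha_j^{(i)} x_j\Big) = z_i \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} z_k\Big) \oplus e_i^*, \quad i = M+1, M+2, \ldots, N, \tag{11}$$

where $e_i^*$ is a realization of a random variable $E_i^*$ such that $\Pr(E_i^* = 1) \geq p_w = \frac{1 - (1-2p)^{w+1}}{2}$.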

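Proof. For every $i \in \{M+1, M+2, \ldots, N\}$, adding the following linear combination of the first $M$ equations

$$\Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} \Big(\bigoplus_{j=1}^{L} \alpha_j^{(k)} x_j\Big)\Big) \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} y_k\Big) = \bigoplus_{k=1}^{M} \beta_k^{(i)} (z_k \oplus e_k), \tag{12}$$

to the $i$th equation of the system yields:

$$\Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} \Big(\bigoplus_{j=1}^{L} \alpha_j^{(k)} x_j\Big)\Big) \oplus \Big(\bigoplus_{j=1}^{L} \alpha_j^{(i)} x_j\Big) = z_i \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} z_k\Big) \oplus e_i \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} e_k\Big). \tag{13}$$

We are left to compute the probability $\Pr(E_i^* = 1)$, where

$$E_i^* = E_i \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} E_k\Big), \quad i = M+1, \ldots, N.$$

Since $i \geq M+1$, $E_i$ is independent of $\beta_k^{(i)} E_k$ for every $1 \leq k \leq M$. We are thus summing the components
of the vector

$$[E_i, E_1\beta_1^{(i)}, \ldots, E_M\beta_M^{(i)}]$$

and

$$\Pr(E_i^* = 1) = 1 - \Pr(E_i^* = 0) = 1 - \Pr\Big(E_i \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} E_k\Big) = 0\Big).$$

Let $w_i \geq w$ denote the Hamming weight of $[\beta_1^{(i)}, \ldots, \beta_M^{(i)}]$, so that exactly $w_i + 1$ noise bits enter the sum.
Now the probability that an even number of digits are 1 in a sequence of $w_i + 1$ independent binary digits
is [6, Lemma 1]

$$\frac{1 + (1-2p)^{w_i+1}}{2}$$

if $p$ is the probability that every digit is 1. Since $0 < 1 - 2p \leq 1$ and $w_i \geq w$, we have that

$$\frac{1 + (1-2p)^{w_i+1}}{2} \leq \frac{1 + (1-2p)^{w+1}}{2}$$

and

$$\Pr(E_i^* = 1) = 1 - \Pr\Big(E_i \oplus \Big(\bigoplus_{k=1}^{M} \beta_k^{(i)} E_k\Big) = 0\Big) \geq 1 - \frac{1 + (1-2p)^{w+1}}{2} = \frac{1 - (1-2p)^{w+1}}{2}$$

since by assumption 1., the weight of each vector $[\beta_1^{(i)}, \ldots, \beta_M^{(i)}]$ is at least $w$, and according to assumption
2., there is no linear combination of the equations which can reduce the corruption noise value lower bounded
by $p_w$ (i.e., it cannot be reduced via any further linear processing of the system of equations). ∎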
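
The bound of Lemma 1 can be checked numerically on small cases. The sketch below compares the closed form $p_w$ with an exact enumeration of the parity of $w + 1$ independent bits; the function names `p_w` and `odd_parity_prob` are assumptions of this example.

```python
from itertools import product

def p_w(p, w):
    """Closed form from Lemma 1: probability that the XOR of w + 1 noise bits is 1."""
    return (1 - (1 - 2 * p) ** (w + 1)) / 2

def odd_parity_prob(p, count):
    """Exact Pr[XOR of `count` i.i.d. Bernoulli(p) bits = 1], by full enumeration."""
    total = 0.0
    for bits in product((0, 1), repeat=count):
        ones = sum(bits)
        if ones % 2 == 1:
            total += p ** ones * (1 - p) ** (count - ones)
    return total

# Combining more equations pushes the effective noise towards 1/2.
noise_levels = [p_w(0.1, w) for w in range(0, 6)]
```

For $p < 1/2$ the values in `noise_levels` increase with $w$ and stay below $1/2$, which is why a larger $w$ makes the resulting LPN instance harder.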

This leads to the main evaluation result: 

Theorem 2 The complexity of recovering the secret key $\mathbf{k}$ based on the algebraic representation of the scheme
is lower bounded by the complexity of solving the $\mathrm{LPN}_{n,\epsilon}$ problem where $\epsilon = \frac{1 - (1-2p)^{w+1}}{2}$ and $n$, $w$ and $p$
are the parameters of the scheme, representing resp. the length of the key, a parameter of the homophonic
encoder and the crossover probability of the BSC.

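Proof. From (9) we have the following system of $\tau n$ overdefined consistent but probabilistic equations over
the binary field GF(2):

$$\begin{aligned}
\mathbf{k}S_1^{(t)} \oplus [\mathbf{0}||\mathbf{u}^{(t)}]\,G_1 &= z_1^{(t)} \oplus v_1^{(t)} \\
\mathbf{k}S_2^{(t)} \oplus [\mathbf{0}||\mathbf{u}^{(t)}]\,G_2 &= z_2^{(t)} \oplus v_2^{(t)} \\
&\ \,\vdots \\
\mathbf{k}S_n^{(t)} \oplus [\mathbf{0}||\mathbf{u}^{(t)}]\,G_n &= z_n^{(t)} \oplus v_n^{(t)}
\end{aligned}
\qquad t = 1, 2, \ldots, \tau, \tag{14}$$

where each noise bit $v_i^{(t)}$ equals 1 with probability $p$ (so that each equation, with its noise term dropped, is
correct with probability $1 - p$), $\mathbf{0}$ is an $\ell$-dimensional vector of all zeroes, and $\mathbf{u}^{(t)} = [u_i^{(t)}]_{i=1}^{m-\ell}$.

The above system of equations fits the setting of Lemma 1 since we have $N = \tau n$ equations, for
$L = n$ unknowns, where $\bigoplus_{j=1}^{L} \alpha_j^{(k)} x_j$, $k = 1, \ldots, N$, correspond to $\mathbf{k}S_i^{(t)}$, $i = 1, \ldots, n$, $t = 1, \ldots, \tau$, and $y_i$,
$i = 1, \ldots, M$, together with $\bigoplus_{j=1}^{M} \beta_j^{(k)} y_j$ for $k = M+1, \ldots, N$ correspond to $[\mathbf{0}||\mathbf{u}^{(t)}]\,G_i$, $i = 1, \ldots, n$,
$t = 1, \ldots, \tau$, since according to Remark 1 we can indeed separate the $\{[\mathbf{0}||\mathbf{u}^{(t)}]\,G_i\}_{i=1}^{n}$ for every $t$ into one
set of linearly independent vectors, and another set which is obtained as linear combinations of the first set
($M$ is then $\tau k$, where $k$ is at most $m - \ell$).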

Note that the above system of $\tau n$ equations contains only $n + \tau(m - \ell)$ unknown variables, and that our
goal is to recover $\mathbf{k}$ only, i.e., we do not have any interest in recovering $\{u_i^{(t)}\}_{i=1}^{m-\ell}$, $t = 1, 2, \ldots, \tau$. Thus, via
Gaussian elimination, we can remove the $\tau(m - \ell)$ unknowns $\{u_i^{(t)}\}_{i=1}^{m-\ell}$, $t = 1, 2, \ldots, \tau$, and obtain $\tau(n - m + \ell)$
equations where only $\mathbf{k}$ is unknown. This transforms the initial system of $\tau n$ equations into the following
one with $\tau(n - m + \ell)$ equations (in total) and $n$ unknowns $\mathbf{k}$:

$$\mathcal{L}_j^{(t)}([k_i]_{i=1}^{n}) = c_j^{(t)}([z_i^{(t)}]_{i=1}^{n}) \oplus c_j'^{(t)}([v_i^{(t)}]_{i=1}^{n}), \quad j = 1, 2, \ldots, n - m + \ell, \quad t = 1, 2, \ldots, \tau, \tag{15}$$

where $\mathcal{L}_j^{(t)}(\cdot)$, $c_j^{(t)}(\cdot)$ and $c_j'^{(t)}(\cdot)$, $j = 1, 2, \ldots, n - m + \ell$, are linear functions, all of them specified by the matrix
$G$ and the Gaussian elimination used to remove the random bits $\mathbf{u}^{(t)}$, while $\mathcal{L}_j^{(t)}$ further depends on the
matrix $S^t$. Note that the Gaussian elimination of the variables $\{u_i^{(t)}\}_{i=1}^{m-\ell}$ can be performed independently
for each $t$, implying that the entire complexity (for $t = 1, 2, \ldots, \tau$) is upper-bounded by $\tau$ times the cost of
one elimination, which is polynomial in $n$ when the most efficient algorithm for the Gaussian processing is employed.

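Lemma 1 and its underlying assumptions provide that each equation in (15) is correct with some probability
lower than $1 - p_w$, where

$$p_w = \frac{1 - (1-2p)^{w+1}}{2},$$

since the noise $(\mathbf{v}^*)^{(t)} = [c_1'^{(t)}([v_i^{(t)}]_{i=1}^{n}), \ldots, c_{n-m+\ell}'^{(t)}([v_i^{(t)}]_{i=1}^{n})]$ has coefficients that are the
realization of a random variable which takes value 1 with probability greater than $\frac{1 - (1-2p)^{w+1}}{2}$. The
above system of $\tau(n - m + \ell)$ equations can consequently be rewritten as:

$$\begin{aligned}
\mathcal{L}_1^*([k_i]_{i=1}^{n}) &= c_1^{(1)}([z_i^{(1)}]_{i=1}^{n}) \\
&\ \,\vdots \\
\mathcal{L}_{n-m+\ell+1}^*([k_i]_{i=1}^{n}) &= c_1^{(2)}([z_i^{(2)}]_{i=1}^{n}) \\
\mathcal{L}_{n-m+\ell+2}^*([k_i]_{i=1}^{n}) &= c_2^{(2)}([z_i^{(2)}]_{i=1}^{n}) \\
&\ \,\vdots \\
\mathcal{L}_{\tau(n-m+\ell)}^*([k_i]_{i=1}^{n}) &= c_{n-m+\ell}^{(\tau)}([z_i^{(\tau)}]_{i=1}^{n})
\end{aligned} \tag{16}$$

where each equation is incorrect with probability greater than $p_w = \frac{1 - (1-2p)^{w+1}}{2}$, and where $\mathcal{L}_j^*$,
$j = 1, 2, \ldots, \tau(n - m + \ell)$, are linear functions.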
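We finally get:

$$\begin{aligned}
\langle \mathbf{k} | \mathbf{c}_1 \rangle &= d_1 \\
\langle \mathbf{k} | \mathbf{c}_2 \rangle &= d_2 \\
&\ \,\vdots \\
\langle \mathbf{k} | \mathbf{c}_{n-m+\ell+1} \rangle &= d_{n-m+\ell+1} \\
\langle \mathbf{k} | \mathbf{c}_{n-m+\ell+2} \rangle &= d_{n-m+\ell+2} \\
&\ \,\vdots \\
\langle \mathbf{k} | \mathbf{c}_{\tau(n-m+\ell)} \rangle &= d_{\tau(n-m+\ell)}
\end{aligned} \tag{17}$$

where each equation is correct with a probability upper-bounded by $1 - p_w = 1 - \frac{1 - (1-2p)^{w+1}}{2}$, and where
the $n$-dimensional binary vectors $\{\mathbf{c}_j\}_{j=1}^{\tau(n-m+\ell)}$ and the bits $\{d_j\}_{j=1}^{\tau(n-m+\ell)}$ are known.

According to the definition of the LPN problem and the above representation, the problem of recovering
the secret key is at least as hard as the $\mathrm{LPN}_{n,\epsilon}$ problem with $\epsilon = \frac{1 - (1-2p)^{w+1}}{2}$, which concludes the proof of
the theorem. ∎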
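
The elimination step used in the proof can be sketched for a single time index $t$. Under the assumptions of this section (all-zero plaintext, linear keystream), the adversary looks for combinations of the $n$ equations in which the random bits cancel: these are the vectors in the null space of the rows of $G$ that multiply $\mathbf{u}^{(t)}$. The helper `nullspace_gf2`, the random toy matrices and all sizes below are assumptions of this example, not the construction analysed in the paper.

```python
import numpy as np

def nullspace_gf2(M):
    """Return a matrix whose columns form a basis of {c : M c = 0} over GF(2)."""
    M = M.copy() % 2
    rows, cols = M.shape
    pivots = []
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i, c]), None)
        if piv is None:
            continue
        M[[r, piv]] = M[[piv, r]]
        for i in range(rows):
            if i != r and M[i, c]:
                M[i] ^= M[r]
        pivots.append(c)
        r += 1
    free = [c for c in range(cols) if c not in pivots]
    basis = []
    for f in free:
        vec = np.zeros(cols, dtype=np.int64)
        vec[f] = 1
        for i, pc in enumerate(pivots):
            vec[pc] = M[i, f]          # pivot variable forced by the free one
        basis.append(vec)
    return np.array(basis, dtype=np.int64).T.reshape(cols, len(free))

rng = np.random.default_rng(1)
n, m, ell, p = 8, 5, 3, 0.1
G_low = rng.integers(0, 2, size=(m - ell, n), dtype=np.int64)  # rows of G hit by u
S_t = rng.integers(0, 2, size=(n, n), dtype=np.int64)          # linearised keystream map
k = rng.integers(0, 2, size=n, dtype=np.int64)
u = rng.integers(0, 2, size=m - ell, dtype=np.int64)
v = (rng.random(n) < p).astype(np.int64)
z = (k @ S_t + u @ G_low + v) % 2                               # one sample, a = 0

N = nullspace_gf2(G_low)            # each column combines equations so that u cancels
lhs = (z @ N) % 2                   # known to the adversary
rhs = (k @ (S_t @ N) + v @ N) % 2   # linear in k, plus combined noise
combined = N.sum(axis=0)            # number of noise bits XORed into each new equation
```

Each entry of `combined` is the number of original equations, and hence noise bits, merged into one key-only equation; the design criteria of Section 4 aim at keeping this number at least $w + 1$.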



4 Homophonic Encoder Design Criteria 

From the above computational security evaluation, it is clear that the design of the homophonic encoder
influences the computational complexity of cryptanalysis. In this section, we make explicit the code design
criteria for homophonic coding, taking into account not only the computational and information-theoretic
security, but also the implementation complexity. Requirements can be expressed either as a function of $G_H$
given $G_{ECC}$, or as a joint function of $G_H$ and $G_{ECC}$.

Indeed, the latter holds in the case of a design of the encoding-encryption system from scratch, where the
design should include a coding box which performs the concatenation of homophonic and error-correction
coding in a manner which fits the rate of the concatenated code to the given constraints. The former on
the other hand applies when upgrading existing systems employing the encoding-encryption paradigm, in
which case the implementation assumption is that the employed, already existing, binary linear block
error-correction code $(m, n)$, which encodes $m$ bits into a codeword from $GF(2^n)$, could be replaced with a binary
block code $(m', n)$ with the same error correction capability but with $m' > m$. Accordingly, $m' - m$ random
bits can be concatenated with $m$ information bits and mapped into the new $m'$ bits via a homophonic
encoder. The output of the homophonic encoder is the input of the error-correcting one.

We recall first that the basic requirements on the matrix $G_H$, as far as information-theoretic security
is concerned, are [11], [12]:

• Invertibility. The matrix $G_H$ should be an invertible matrix, so that the receiver can decode the
homophonic encoding.

• Security. The matrix $G_H$ should map $[\mathbf{a}||\mathbf{u}]$ so that in the resulting vector each bit of data from $\mathbf{a}$
is affected by at least one random bit from $\mathbf{u}$ (to ensure that each bit of ciphertext is
affected by at least one bit of $\mathbf{u}$).

4.1 Computational complexity design criteria 

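Recall from (3) that

$$C_{ECC}(C_H(\mathbf{a}||\mathbf{u})) = [\mathbf{a}||\mathbf{u}]\,G_H G_{ECC} = [\mathbf{a}||\mathbf{u}]\,G$$

where $G = [g_{i,j}]_{i=1,j=1}^{m,n}$ is an $m \times n$ matrix containing both the homophonic and the error correction encoding.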

The basic design requirements for a suitable homophonic encoder, i.e., the matrix $G_H$, are pointed
out above, and this section contains additional guidelines to design a dedicated homophonic encoding which
provides maximum complexity of the underlying LPN problem for given implementation and communications
overhead.

It is well known that the hardness of the $\mathrm{LPN}_{n,\epsilon}$ problem in the average case heavily depends on the
parameter $\epsilon$ (see [5] and [7], for example). On the other hand, Theorem 2 implies that the parameter $\epsilon$
depends on the minimal number of the basic equations which should be linearly combined in order to eliminate
the random variables from each equation of the system. Accordingly, Theorem 2 implies the following design
criteria for construction of the matrix $G_H$.
• Weight. For a given error-correcting code generator matrix $G_{ECC}$, specify the homophonic code
matrix $G_H$ so that the resulting matrix $G_H = [g_{i,j}^{(H)}]$ satisfies:

$$\sum_{i=1}^{m-\ell} g_{i+\ell,\,m-\ell+j}^{(H)} \geq w, \quad j = 1, 2, \ldots, \ell,$$

where $w$ is a parameter.

• Dependability. According to the derivation leading to (16), the sub-matrix of the matrix $G$ consisting of
its $m - \ell$ last rows should be such that any of the columns is a linear combination of at least $w$ other columns.
Consider thus the sub-matrix $G^*$ determined by the rows $m - \ell + 1, m - \ell + 2, \ldots, m$ and columns
$1, 2, \ldots, n$ of the matrix $G$. We require that no column of the matrix $G^*$ is equal to a linear combination
of $w$ or less other columns of $G^*$. This can be rephrased by asking

$$\mathrm{rank}(G^*) \geq w + 1.$$
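
The rank condition in the dependability criterion is easy to test mechanically. The following sketch computes the rank of a binary matrix over GF(2) and compares it with $w + 1$; the names `rank_gf2` and `meets_dependability` are assumptions of this example, and the check implements only the rank rephrasing stated above.

```python
import numpy as np

def rank_gf2(M):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination."""
    M = np.array(M, dtype=np.int64) % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        piv = next((i for i in range(rank, rows) if M[i, c]), None)
        if piv is None:
            continue
        M[[rank, piv]] = M[[piv, rank]]
        for i in range(rows):
            if i != rank and M[i, c]:
                M[i] ^= M[rank]
        rank += 1
        if rank == rows:
            break
    return rank

def meets_dependability(G_star, w):
    """Rank rephrasing of the dependability criterion: rank(G*) >= w + 1."""
    return rank_gf2(G_star) >= w + 1

# Toy sub-matrix (illustrative only).
G_star = np.array([[1, 0, 0, 1, 1],
                   [0, 1, 0, 1, 0],
                   [0, 0, 1, 0, 1]])
ok_for_w2 = meets_dependability(G_star, 2)
ok_for_w3 = meets_dependability(G_star, 3)
```

Note that the rank must be taken over GF(2): a matrix can have full rank over the reals and still be rank deficient modulo 2.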



4.2 Implementation design criteria 

On the sender side, both the homophonic and error-correcting encodings are performed via a single
vector-matrix multiplication employing the matrix $G = G_H G_{ECC}$.

On the receiver side the error-correction decoding and the homophonic decoding should be performed
independently. First the errors should be corrected by the error-correction decoding, because the homophonic
decoding requires error-free input.

This implies that in order to minimize the implementation complexity, a desirable property is sparseness
of the related matrices $G_H$, $G$ and $G_H^{-1}$.

• Sparsity. For a given error-correcting code generator matrix $G_{ECC}$, and a given security parameter $w$,
specify the homophonic encoding matrix $G_H$ in such a manner that either it is sparse or the resulting
matrix $G$ is sparse, to minimize the implementation complexity on the sender side, and
at the same time the matrix $G_H^{-1}$ is sparse in order to avoid too high a computation overhead for the
receiver.



5 Homophonic Code Constructions 

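Let us first write the $m \times m$ wire-tap matrix $G_H$ and the $m \times n$ error-correcting matrix $G_{ECC}$ as

$$G_H = \begin{bmatrix} G_H^{(1)} & G_H^{(2)} \\ I_{m-\ell} & G_H^{(4)} \end{bmatrix}, \qquad G_{ECC} = \begin{bmatrix} G_{ECC}^{(1)} \\ G_{ECC}^{(2)} \end{bmatrix} \tag{18}$$

where $G_H^{(1)}$ is an $\ell \times (m - \ell)$ matrix, $G_H^{(2)}$ is an $\ell \times \ell$ matrix, $I_{m-\ell}$ denotes the $(m - \ell) \times (m - \ell)$ identity
matrix, $G_H^{(4)}$ is an $(m - \ell) \times \ell$ matrix written as

$$G_H^{(4)} = \begin{bmatrix}
g_{\ell+1,m-\ell+1}^{(H)} & g_{\ell+1,m-\ell+2}^{(H)} & \cdots & g_{\ell+1,m}^{(H)} \\
g_{\ell+2,m-\ell+1}^{(H)} & g_{\ell+2,m-\ell+2}^{(H)} & \cdots & g_{\ell+2,m}^{(H)} \\
\vdots & \vdots & & \vdots \\
g_{m,m-\ell+1}^{(H)} & g_{m,m-\ell+2}^{(H)} & \cdots & g_{m,m}^{(H)}
\end{bmatrix},$$

$G_{ECC}^{(1)}$ is an $(m - \ell) \times n$ matrix and finally $G_{ECC}^{(2)}$ is an $\ell \times n$ matrix, so that

$$G = G_H G_{ECC} = \begin{bmatrix} G_H^{(1)} & G_H^{(2)} \\ I_{m-\ell} & G_H^{(4)} \end{bmatrix} \begin{bmatrix} G_{ECC}^{(1)} \\ G_{ECC}^{(2)} \end{bmatrix} = \begin{bmatrix} G_H^{(1)} G_{ECC}^{(1)} + G_H^{(2)} G_{ECC}^{(2)} \\ G_{ECC}^{(1)} + G_H^{(4)} G_{ECC}^{(2)} \end{bmatrix}.$$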

5.1 A generic construction 

We now give a general construction method for the matrix Gh- Choose first 



G 



(1) 

H 



I/. 



Let us check that we already satisfy the information theoretical requirements. 

• Invertibility. Since Gh is a square matrix, we can rephrase its invertibility using its determinant by 
asking 

det(Gi/) ^ 0. 
Using Schur complement, this is equivalent to 

dot(G(^^-G(J'G(^')^0. 



The above choice of G^^ and G^'' gives det(I;) ^ which always holds, so that the invertibility 
condition is taken care of. 

• Security. The matrix $G_H$ should map $[a\,\|\,u]$ so that in the resulting vector each bit of data from $a$ is
affected by at least one random bit from $u$. Since

$$
[a \;\; u]\, G_H
= [a \;\; u] \begin{bmatrix} 0_{\ell \times (m-\ell)} & I_\ell \\ I_{m-\ell} & G_H^{(4)} \end{bmatrix}
= \big[\, u \;\;\; a + u\,G_H^{(4)} \,\big],
$$

it is enough that $G_H^{(4)}$ has no column with only zeroes to get that indeed each bit of data from $a$ is
affected by at least one random bit from $u$.
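
The mapping described in the Security item can be checked numerically. The following is a minimal sketch over GF(2), assuming row vectors and the choice $G_H^{(1)} = 0$, $G_H^{(2)} = I_\ell$. The helper names (`build_gh`, `encode`, `decode`) and the sample matrix `G4` are hypothetical and serve only as an illustration.

```python
from itertools import product

def build_gh(g4):
    """Assemble G_H = [[0, I_l], [I_(m-l), G4]] over GF(2) as a list of rows."""
    r, l = len(g4), len(g4[0])          # r = m - l random bits, l data bits
    top = [[0] * r + [int(i == j) for j in range(l)] for i in range(l)]
    bottom = [[int(i == j) for j in range(r)] + list(g4[i]) for i in range(r)]
    return top + bottom

def vec_mat(v, mat):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] & mat[i][j] for i in range(len(v))) % 2
            for j in range(len(mat[0]))]

def encode(a, u, g4):
    """Homophonic encoding [a || u] * G_H."""
    return vec_mat(list(a) + list(u), build_gh(g4))

def decode(y, g4):
    """Recover a from y = [u, a + u*G4] by adding u*G4 back (addition is XOR)."""
    r = len(g4)
    u, masked = y[:r], y[r:]
    return [b ^ c for b, c in zip(masked, vec_mat(u, g4))]

# Hypothetical G4 with m - l = 2 random bits, l = 3 data bits, no all-zero column.
G4 = [[1, 1, 0],
      [0, 1, 1]]

for a in product([0, 1], repeat=3):
    for u in product([0, 1], repeat=2):
        y = encode(a, u, G4)
        assert y[:2] == list(u)            # the first m - l symbols are u itself
        assert decode(y, G4) == list(a)    # the map is invertible
```

Because no column of `G4` is zero, every data bit in the second half of the output is masked by at least one random bit, while the decoder only needs one extra multiplication by `G4`.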

We next look at the conditions coming from computational security. The weight condition can be
rephrased as requiring that each column of $G_H^{(4)}$ has Hamming weight at least $w$, which automatically
makes sure that $G_H^{(4)}$ has no column with only zeroes.

The dependability condition relates to the sub-matrix $G^*$ determined by the rows $m-\ell+1, m-\ell+2, \ldots, m$
and columns $1, 2, \ldots, n$ of the matrix $G$. Since $m - \ell$ counts the number of random bits, it is reasonable to
assume that

$$m - \ell \leq \ell \iff m \leq 2\ell,$$

that is we use at most as many random bits as data bits. Since



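$$
G = G_H G_{ECC}
= \begin{bmatrix} 0_{\ell \times (m-\ell)} & I_\ell \\ I_{m-\ell} & G_H^{(4)} \end{bmatrix}
\begin{bmatrix} G_{ECC}^{(1)} \\ G_{ECC}^{(2)} \end{bmatrix}
= \begin{bmatrix} G_{ECC}^{(2)} \\ G_{ECC}^{(1)} + G_H^{(4)} G_{ECC}^{(2)} \end{bmatrix}
$$

with, assuming w.l.o.g. that $G_{ECC}$ is in systematic form,

$$
G_{ECC}^{(1)} = \begin{bmatrix}
g^{(ECC)}_{1,1} & g^{(ECC)}_{1,2} & \cdots & g^{(ECC)}_{1,n} \\
g^{(ECC)}_{2,1} & g^{(ECC)}_{2,2} & \cdots & g^{(ECC)}_{2,n} \\
\vdots & \vdots & & \vdots \\
g^{(ECC)}_{m-\ell,1} & g^{(ECC)}_{m-\ell,2} & \cdots & g^{(ECC)}_{m-\ell,n}
\end{bmatrix}
= \big[\, I_{m-\ell} \;\; 0_{(m-\ell) \times \ell} \;\; Q'_{(m-\ell) \times (n-m)} \,\big]
$$

and

$$
G_{ECC}^{(2)} = \begin{bmatrix}
g^{(ECC)}_{m-\ell+1,1} & g^{(ECC)}_{m-\ell+1,2} & \cdots & g^{(ECC)}_{m-\ell+1,n} \\
g^{(ECC)}_{m-\ell+2,1} & g^{(ECC)}_{m-\ell+2,2} & \cdots & g^{(ECC)}_{m-\ell+2,n} \\
\vdots & \vdots & & \vdots \\
g^{(ECC)}_{m,1} & g^{(ECC)}_{m,2} & \cdots & g^{(ECC)}_{m,n}
\end{bmatrix}
= \big[\, 0_{\ell \times (m-\ell)} \;\; I_\ell \;\; Q_{\ell \times (n-m)} \,\big],
$$

we can write the $\ell \times n$ matrix $G^*$ as

$$
G^* = \begin{bmatrix}
\begin{matrix}
g^{(ECC)}_{2m-2\ell+1,1} & g^{(ECC)}_{2m-2\ell+1,2} & \cdots & g^{(ECC)}_{2m-2\ell+1,n} \\
\vdots & \vdots & & \vdots \\
g^{(ECC)}_{m,1} & g^{(ECC)}_{m,2} & \cdots & g^{(ECC)}_{m,n}
\end{matrix} \\
G_{ECC}^{(1)} + G_H^{(4)} G_{ECC}^{(2)}
\end{bmatrix}.
$$

Now

$$
G_H^{(4)} G_{ECC}^{(2)}
= G_H^{(4)} \big[\, 0_{\ell \times (m-\ell)} \;\; I_\ell \;\; Q_{\ell \times (n-m)} \,\big]
= \big[\, 0_{m-\ell} \;\; G_H^{(4)} \;\; G_H^{(4)} Q_{\ell \times (n-m)} \,\big]
$$

so that finally

$$
G^* = \begin{bmatrix}
\begin{matrix} 0_{(2\ell-m) \times (2m-3\ell)} & I_{2\ell-m} & 0_{(2\ell-m) \times \ell} & Q_{(2\ell-m) \times (n-m)} \end{matrix} \\
\begin{matrix} I_{m-\ell} & G_H^{(4)} & G_H^{(4)} Q_{\ell \times (n-m)} + Q'_{(m-\ell) \times (n-m)} \end{matrix}
\end{bmatrix}.
$$
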

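The requirement is that

$$\mathrm{rank}(G^*) \geq w + 1.$$

Since $\ell < m < n$, the rank of $G^*$ is at most $\ell$, and it is enough to look at the rank of the $\ell \times m$ submatrix

$$
\begin{bmatrix}
\begin{matrix} 0_{(2\ell-m) \times (2m-3\ell)} & I_{2\ell-m} & 0_{(2\ell-m) \times \ell} \end{matrix} \\
\begin{matrix} I_{m-\ell} & G_H^{(4)} \end{matrix}
\end{bmatrix}
\tag{19}
$$

which varies from $m - \ell$ to $\ell$ since the first $m - \ell$ columns are linearly independent. Thus if $w + 1 \leq m - \ell$,
the dependability condition is satisfied naturally. Otherwise, we need to build $G_H^{(4)}$ such that $k$ of its columns,
$k = 1, \ldots, 2\ell - m$, are linearly independent from the $m - \ell$ first columns of the above matrix. To do so, it is
enough to consider the $2\ell - m$ first columns of $G_H^{(4)}$, and we consider the truncated version of the matrix (19)

$$
\begin{bmatrix}
\begin{matrix} 0_{(2\ell-m) \times (2m-3\ell)} & I_{2\ell-m} & 0_{(2\ell-m) \times (2\ell-m)} \end{matrix} \\
\begin{matrix} I_{m-\ell} & A \end{matrix}
\end{bmatrix}
$$

where $A$ contains the $2\ell - m$ first columns of $G_H^{(4)}$. Let us further write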



$$
A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
$$

where $A_1$ is a $(2m - 3\ell) \times (2\ell - m)$ matrix, and $A_2$ is a square matrix of size $2\ell - m$. To control the rank of $G^*$,
we set $A_1 = 0$ and we get

$$\mathrm{rank}(G^*) = m - \ell + k$$

where $k$ is the number of columns of $A_2$ which are linearly independent from the matrix

$$
\big[\, 0_{(2\ell-m) \times (2m-3\ell)} \;\; I_{2\ell-m} \,\big].
\tag{20}
$$

Setting $A_1$ to zero makes this computation easier. Indeed, to get $k$ such columns, it is enough to pick $k$
columns from the identity matrix of size $2\ell - m$. This might give some columns with zero or very few ones, which
seems to contradict the weight condition. However, columns with higher Hamming weight can easily be
obtained by taking linear combinations of the columns, which does not change the rank.
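
The last remark, that column weights can be raised without changing the rank, can be illustrated with a short sketch over GF(2). It assumes the linear combinations are invertible column operations (adding one column to another). The function names and the 3 x 3 starting block are hypothetical and chosen only for illustration.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    basis = {}                       # leading-bit position -> reduced row (as int)
    for row in rows:
        v = int("".join(map(str, row)), 2)
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]          # eliminate the leading bit and continue
    return len(basis)

def add_column(mat, src, dst):
    """Return a copy of mat with column src added (XOR) into column dst."""
    out = [list(r) for r in mat]
    for r in out:
        r[dst] ^= r[src]
    return out

def column_weights(mat):
    """Hamming weight of every column."""
    return [sum(col) for col in zip(*mat)]

# Start from identity columns: full rank, but every column has weight 1.
start = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]

dense = add_column(start, 0, 1)      # column 1 becomes e1 + e2
dense = add_column(dense, 1, 2)      # column 2 becomes e1 + e2 + e3
dense = add_column(dense, 2, 0)      # column 0 becomes e2 + e3

print(column_weights(start), gf2_rank(start))
print(column_weights(dense), gf2_rank(dense))
```

Each step adds one column to a different one, which is an invertible operation, so the rank stays at 3 while every column weight rises to at least 2.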
• Dependability and Weight. Since

$$\mathrm{rank}(G^*) = m - \ell + k$$

where $k$ is the number of columns of the sub-matrix of $G_H^{(4)}$ formed by taking its first $2\ell - m$ columns
and last $2\ell - m$ rows which are linearly independent from (20), it is enough to ask for

$$k \geq w + 1 + \ell - m.$$

To ensure that each column of $G_H^{(4)}$ has Hamming weight at least $w$, it is enough to take linear combinations
of the columns.



• Sparsity. The choice of $G_H^{(1)} = 0_{\ell \times (m-\ell)}$ and $G_H^{(2)} = I_\ell$ makes the $\ell$ first rows of $G_H$ as sparse as
possible, since replacing any of their ones with a zero would make the matrix non-invertible. Furthermore, the
way the dependability condition is met is also optimal in the sense that it starts with the least
number of ones needed to get the wanted rank, and then obtains the desired Hamming weight of each column by
linear combinations.
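
As a side check on the receiver cost, under the choice $G_H^{(1)} = 0_{\ell \times (m-\ell)}$ and $G_H^{(2)} = I_\ell$ the inverse of $G_H$ over GF(2) can be written down explicitly. The short verification below is a sketch added for illustration and relies only on the fact that $G_H^{(4)} + G_H^{(4)} = 0$ over GF(2).

```latex
\[
G_H = \begin{bmatrix} 0_{\ell\times(m-\ell)} & I_\ell \\ I_{m-\ell} & G_H^{(4)} \end{bmatrix},
\qquad
G_H^{-1} = \begin{bmatrix} G_H^{(4)} & I_{m-\ell} \\ I_\ell & 0_{\ell\times(m-\ell)} \end{bmatrix},
\]
\[
G_H\,G_H^{-1}
= \begin{bmatrix} I_\ell & 0_{\ell\times(m-\ell)} \\ G_H^{(4)} + G_H^{(4)} & I_{m-\ell} \end{bmatrix}
= I_m .
\]
```

Both $G_H$ and $G_H^{-1}$ then contain $m$ ones plus the ones of $G_H^{(4)}$, so keeping $G_H^{(4)}$ sparse keeps the encoder and the decoder equally cheap.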



5.2 Examples of Constructions 

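Take $m = 2\ell$ so that $m - \ell = \ell$, and

$$
G_H = \begin{bmatrix} 0_\ell & I_\ell \\ I_\ell & G_H^{(4)} \end{bmatrix}
= \begin{bmatrix} 0_\ell & I_\ell \\ I_\ell & I_\ell \end{bmatrix}.
$$

The matrix $G_H$ is clearly invertible. The Hamming weight of each column of $G_H^{(4)}$ is 1, thus $w$ must be
taken to be 0 or 1. The matrix $G_H^{(1)}$ is chosen to be zero for increasing the sparsity of $G_H$.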
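As a toy example, let us consider the (7, 4) Hamming code with

$$
G_{ECC} = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
$$

and

$$
G_H = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{pmatrix}.
$$

We have that

$$
G_H G_{ECC} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}.
$$

The matrix $G^*$ is thus

$$
G^* = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}.
$$

Since the requirement is that the rank of $G^*$ is at least $w$, this is clearly satisfied here since $w = 1$. Note
that

$$
G_H^{-1} = \begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
$$

and the cost of encoding and decoding the homophonic code is the same.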
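In order to increase $w$, we could alternatively take

$$
G_H = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 1
\end{pmatrix}
$$

for which $w$ can be taken to be 2. Then, continuing with the (7,4) Hamming code, we get

$$
G_H G_{ECC} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}
$$

where

$$
G^* = \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}
$$

has rank $w = 2$ as required. This time

$$
G_H^{-1} = \begin{pmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
$$

and as expected, increasing $w$ correspondingly decreases the sparsity of $G_H$ and its inverse.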
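
Both toy examples can be replayed with a few lines of GF(2) arithmetic. The sketch below assumes the standard systematic generator matrix of the (7, 4) Hamming code written out in the code, and uses $m = 4$, $\ell = 2$. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 0/1 matrices over GF(2)."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]

def count_ones(mat):
    """Number of nonzero entries, used here as a rough cost measure."""
    return sum(sum(row) for row in mat)

# Standard systematic generator matrix of the (7, 4) Hamming code (assumed form).
G_ECC = [[1, 0, 0, 0, 1, 1, 0],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]

def g_h(g4):
    """G_H = [[0, I_2], [I_2, G4]] for m = 4, l = 2."""
    return [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0] + g4[0], [0, 1] + g4[1]]

def g_h_inverse(g4):
    """Candidate inverse [[G4, I_2], [I_2, 0]] over GF(2)."""
    return [g4[0] + [1, 0], g4[1] + [0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]

def check(g4):
    """Return (G*, ones in G_H, ones in its inverse) for a given G4."""
    gh, gh_inv = g_h(g4), g_h_inverse(g4)
    identity = [[int(i == j) for j in range(4)] for i in range(4)]
    assert mat_mul(gh, gh_inv) == identity      # the candidate really is the inverse
    g_star = mat_mul(gh, G_ECC)[2:]             # rows m-l+1, ..., m of G
    # Two distinct nonzero rows over GF(2) are linearly independent: rank 2.
    assert g_star[0] != g_star[1] and any(g_star[0]) and any(g_star[1])
    return g_star, count_ones(gh), count_ones(gh_inv)

for g4 in ([[1, 0], [0, 1]], [[1, 1], [1, 1]]):
    g_star, enc_ones, dec_ones = check(g4)
    print(g_star, enc_ones, dec_ones)
```

In both cases the encoder and decoder matrices have the same number of ones, and that number grows when the columns of `g4` are made heavier, which matches the trade-off between $w$ and sparsity noted above.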



6 Conclusion 

The paper addresses the problem of designing a homophonic code for certain enhanced encoding-encryption
based communication systems, reported and analyzed from an information-theoretic point of view in [11]-[12].
The design is based on the guidelines implied by a security evaluation from the computational complexity
point of view. Accordingly, this paper yields the following: (i) a security evaluation of the considered system
from the computational complexity point of view; (ii) guidelines for the design of a dedicated homophonic coding
implied by the performed security evaluation; (iii) a proposal of a dedicated homophonic code which provides
the desired level of security and at the same time a low implementation overhead.

The security of the employed encryption is evaluated through the hardness of recovering the secret key
based on the algebraic representation of the encryption in the CPA scenario. It is shown that the addressed secret
key recovery is at least as hard as the LPN problem with corrupting noise $\epsilon$, assuming an appropriate design,
where $\epsilon$ depends on the system parameters $p < 0.5$ and $w$. Note that in the average
complexity consideration, the LPN problem corresponding to the parameter $\epsilon$ is much harder than the one
with the parameter $p$. Accordingly, assuming that the parameters of the scheme are appropriately selected,
the complexity of the secret key recovery based on the algebraic representation appears approximately as
hard as the exhaustive search over all possible secret keys.
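
To give a feeling for why a combined noise level can be much closer to 1/2 than the raw noise $p$, here is a small sketch based on the piling-up lemma. It assumes that the effective noise bit is the XOR of `t` independent bits, each equal to 1 with probability `p`. The parameter `t` and the function name are hypothetical, and the exact expression for $\epsilon$ in terms of $p$ and $w$ is not reproduced here.

```python
def xor_noise(p, t):
    """Probability that the XOR of t independent Bernoulli(p) bits equals 1."""
    return (1 - (1 - 2 * p) ** t) / 2

# Hypothetical raw noise p = 0.25 combined over t = 1, 2, 3 bits.
for t in (1, 2, 3):
    print(t, xor_noise(0.25, t))
```

For `p = 0.25` the values are 0.25, 0.375 and 0.4375, so each additional combined bit halves the distance to 1/2, the point at which the noisy parity samples carry no information.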

The results of the security evaluation serve as guidelines for the design of a dedicated homophonic
encoder which provides a desired security level and minimizes the implementation complexity. Assuming
that the homophonic code should be linear, besides the basic requirements on the invertibility and
the mixing properties of the generator matrix, the following three additional criteria are pointed out and
specified in Sections 4.1 and 4.2: (a) weight of the columns of the generator matrix; (b) rank of the generator
matrix; (c) sparsity of the generator matrix. Criteria (a) and (b) follow from the security
requirements, and criterion (c) is related to minimization of the implementation overhead. These
design criteria are employed for the design of a dedicated homophonic code. A generic design of the homophonic
coding dedicated to the considered security-enhanced communication system is proposed, and it is shown
that the design fulfills all the given criteria.






Acknowledgments 



The research of F. Oggier is supported in part by the Singapore National Research Foundation under Research 
Grant NRF-RF2009-07 and NRF-CRP2-2007-03, and in part by the Nanyang Technological University under 
Research Grant M581 10049 and M581 10070. This work was done partly while M. Mihaljevic was visiting the 
division of mathematical sciences, Nanyang Technological University, Singapore, and partly while F. Oggier 
was visiting the Research Center for Information Security, Tokyo. M. Mihaljevic is partly supported via the 
Project # 174008. 



References 

[1] E.R. Berlekamp, R.J. McEliece, and H.C.A. van Tilborg, "On the Inherent Intractability of Certain
Coding Problems", IEEE Trans. Info. Theory, vol. 24, pp. 384-386, 1978.

[2] A. Blum, A. Kalai and H. Wasserman, "Noise-Tolerant Learning, the Parity Problem, and the Statistical
Query Model", Journal of the ACM, vol. 50, no. 4, pp. 506-519, July 2003.

[3] GSM Technical Specifications: European Telecommunications Standards Institute (ETSI), Digital
cellular telecommunications system (Phase 2+); Physical layer on the radio path; General description, TS
100 573 (GSM 05.01), http://www.etsi.org.

[4] GSM Technical Specifications: European Telecommunications Standards Institute (ETSI), Digital
cellular telecommunications system (Phase 2+); Channel Coding, TS 100 909 (GSM 05.03),
http://www.etsi.org.

[5] M. Fossorier, M.J. Mihaljevic, H. Imai, Y. Cui and K. Matsuura, "An Algorithm for Solving the LPN
Problem and its Application to Security Evaluation of the HB Protocols for RFID Authentication",
INDOCRYPT 2006, Lecture Notes in Computer Science, vol. 4329, pp. 48-62, Dec. 2006.

[6] R. G. Gallager, "Low-density parity-check codes", IRE Trans. Inf. Theory, vol. IT-8, no. 1, pp. 21-28,
Jan. 1962.

[7] E. Levieil and P.-A. Fouque, "An Improved LPN Algorithm", SCN 2006, Lecture Notes in Computer
Science, vol. 4116, pp. 348-359, 2006.

[8] H.N. Jendal, Y.J.B. Kuhn, and J.L. Massey, "An information-theoretic treatment of homophonic
substitution", EUROCRYPT'89, Lecture Notes in Computer Science, vol. 434, pp. 382-394, 1990.

[9] J. Massey, "Some Applications of Source Coding in Cryptography", European Transactions on
Telecommunications, vol. 5, pp. 421-429, July-August 1994.

[10] M.J. Mihaljevic and H. Imai, "An approach for stream ciphers design based on joint computing over
random and secret data", Computing, vol. 85, no. 1-2, pp. 153-168, June 2009.

[11] M.J. Mihaljevic and F. Oggier, "A Wire-tap Approach to Enhance Security in Communication Systems
using the Encoding-Encryption Paradigm", 2010 IEEE 17th Int. Conf. on Telecommunications - ICT
2010, Proceedings, pp. 484-489, April 2010.

[12] F. Oggier and M.J. Mihaljevic, "An Information-Theoretic Analysis of the Security of Communication
Systems Employing the Encoding-Encryption Paradigm", available as CoRR abs/1008.0968, Aug. 2010.

[13] A. Thangaraj, S. Dihidar, A.R. Calderbank, S.W. McLaughlin, and J.-M. Merolla, "Applications of
LDPC Codes to the Wiretap Channel", IEEE Trans. Information Theory, vol. 53, no. 8, pp. 2933-2945,
August 2007.

[14] A.D. Wyner, "The wire-tap channel", Bell System Technical Journal, vol. 54, pp. 1355-1387, Oct. 1975.






